A Clustering Method of Highly Dimensional Patent Data Using Bayesian Approach
Author
Abstract
Patent data contain diverse technological information from every technology field, so many companies manage patent data to build their R&D policies. Patent analysis is one approach to patent management and an important tool for technology forecasting, and patent clustering is one of the tasks of patent analysis. In this paper, we propose an efficient clustering method for patent documents. Patent data generally consist of text documents, which have a highly dimensional structure, and this high dimensionality makes the documents difficult to cluster. We therefore consider a Bayesian approach to the problem of high dimensionality. Traditional clustering algorithms are based on similarity or distance measures, whereas Bayesian clustering uses the probability distribution of the data; we apply this idea of Bayesian clustering to the problem addressed in this research. To verify the performance of the proposed method, we will conduct experiments using patent documents retrieved from the United States Patent and Trademark Office.
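The abstract contrasts distance-based clustering with clustering based on the probability distribution of the data. As a minimal sketch of that idea, and not the authors' model (the abstract does not specify it), the Python below represents each document as term counts, so the dimension equals the vocabulary size, and clusters the documents with a mixture of multinomial distributions fitted by expectation-maximization (EM). The function names (`build_vocab`, `em_run`, `multinomial_mixture_clusters`), the Dirichlet smoothing parameter `alpha`, the restart scheme, and the toy documents are all hypothetical choices for illustration, not USPTO data.

```python
import math
import random
from collections import Counter


def build_vocab(docs):
    # Each distinct term becomes one dimension of the document vectors; on a
    # real patent corpus this vocabulary easily reaches tens of thousands.
    terms = sorted({w for d in docs for w in d.lower().split()})
    return {w: i for i, w in enumerate(terms)}


def logsumexp(xs):
    # Numerically stable log(sum(exp(x))) used to normalise log-probabilities.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def em_run(X, k, V, iters, alpha, rng):
    """One EM run of a multinomial mixture; returns (responsibilities, log-likelihood)."""
    n = len(X)
    # Start from random soft cluster memberships.
    resp = []
    for _ in range(n):
        r = [rng.random() + 1e-9 for _ in range(k)]
        total = sum(r)
        resp.append([v / total for v in r])
    loglik = -math.inf
    for _ in range(iters):
        # M-step: add-one smoothed mixing weights and Dirichlet(alpha)-smoothed
        # (MAP) word probabilities for every cluster.
        log_pi, log_theta = [], []
        for j in range(k):
            nk = sum(resp[i][j] for i in range(n))
            log_pi.append(math.log((nk + 1.0) / (n + k)))
            counts = [alpha] * V
            for i in range(n):
                for w, c in X[i].items():
                    counts[w] += resp[i][j] * c
            norm = sum(counts)
            log_theta.append([math.log(c / norm) for c in counts])
        # E-step: posterior probability of each cluster for each document.
        loglik = 0.0
        for i in range(n):
            scores = [log_pi[j] + sum(c * log_theta[j][w] for w, c in X[i].items())
                      for j in range(k)]
            z = logsumexp(scores)
            loglik += z
            resp[i] = [math.exp(s - z) for s in scores]
    return resp, loglik


def multinomial_mixture_clusters(docs, k, iters=50, alpha=1.0, restarts=5, seed=0):
    # Hypothetical helper: keep the EM restart with the highest log-likelihood.
    vocab = build_vocab(docs)
    X = [Counter(vocab[w] for w in d.lower().split()) for d in docs]
    rng = random.Random(seed)
    best_resp, best_ll = None, -math.inf
    for _ in range(restarts):
        resp, ll = em_run(X, k, len(vocab), iters, alpha, rng)
        if ll > best_ll:
            best_resp, best_ll = resp, ll
    # Hard assignment: the cluster with the highest posterior probability.
    return [max(range(k), key=lambda j: r[j]) for r in best_resp]


# Invented toy documents (not patent data): three about batteries, three
# about engines, with disjoint vocabularies.
docs = [
    "lithium battery electrode anode cell",
    "battery cell lithium electrolyte anode",
    "anode electrode battery lithium cathode",
    "engine piston fuel combustion cylinder",
    "fuel injection engine cylinder piston",
    "combustion engine piston cylinder exhaust",
]
labels = multinomial_mixture_clusters(docs, k=2)
print(labels)
```

Each cluster here is a probability distribution over the whole vocabulary, so a document is assigned by its likelihood under each cluster rather than by a pairwise distance; the restarts guard against poor local optima of EM. A fully Bayesian treatment would place distributions over the parameters (for example, via sampling or variational inference) instead of using the point estimates in this sketch.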
Similar resources
Vacant Technology Forecasting based on Patent Analysis Using an Ensemble Method and Bayesian Clustering
Patent analysis is an important approach to technology forecasting because patents are an important component of technology development. We also use the results of technology forecasting to build R&D strategies efficiently. In this paper, we consider patent clustering as one form of patent analysis. That is, we cluster patent documents in order to forecast the vacant area of a given technology f...
High-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithms for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM), a fuzzy learning scheme inspired by some behavioral features of human brain functionality. The high-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
Fuzzy clustering of time series data: A particle swarm optimization approach
With the rapid development of information-gathering technologies and access to large amounts of data, we need methods for analyzing data and extracting useful information from large raw datasets, and data mining is an important approach to this problem. Clustering analysis, the most commonly used function of data mining, has attracted many researchers in computer science. Because o...
A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach
In recent years, the advancement of information-gathering technologies such as GPS and GSM networks has led to huge, complex datasets such as time series and trajectories. As a result, it is essential to use appropriate methods to analyze the large raw datasets produced. Extracting useful information from large data sets has always been one of the most important challenges in different sciences,...
Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in the prevention, diagnosis, and treatment of diseases at the genomic level. In this paper, fast global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement over the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...